Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Recheck if remote device is cached before requesting it #16252

Merged
merged 4 commits into from
Sep 7, 2023

Conversation

erikjohnston
Copy link
Member

This fixes a bug where we could get stuck re-requesting the device over replication again and again.

This fixes a bug where we could get stuck re-requesting the device over
replication again and again.
@erikjohnston erikjohnston marked this pull request as ready for review September 5, 2023 13:10
@erikjohnston erikjohnston requested a review from a team as a code owner September 5, 2023 13:10
Copy link
Member

@clokep clokep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is OK, but...I have questions.

user_result, user_failed = await self._user_device_resync_returning_failed(
user_id
)
async with self._resync_linearizer.queue(user_id):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is queueing the right thing here? Should we instead share the results via an ObservableDeferred? (If you have two requests for the same user in a row there's no reason to service the second if the first is ongoing?)

Edit: It looks like this is essentially what we're doing by checking the cache inside of _user_device_resync_returning_failed?


Will this make it harder to batch these? (Maybe not the immediate concern, but don't want to design ourselves into a corner.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If these weren't device lists I would be more gung ho about batching these up. However, we need to make sure that we don't break any semantics. Specifically I'm worried about the situations where we get two requests for a remote users device lists, one that happens before we get poked about a new device and one after, if we coalesce those two remote calls we need to make sure that we don't return the old set of devices (retrieved by the first request) to the second request. IYSWIM....

I don't think this backs us in to a corner, we can always change this down the line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're not talking about the same batching -- I was talking about batching multiple users (based on the comment a few lines above this), but maybe that's tricky.

Comment on lines +1307 to +1311
# Check that we haven't gone and fetched the devices since we last
# checked if we needed to resync these device lists.
if await self.store.get_users_whose_devices_are_cached([user_id]):
cached = await self.store.get_cached_devices_for_user(user_id)
return cached, False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is safe because if we linearized on user ID, then request the same user again but have not received a request to invalidate that data (via a resync request) then it must still be valid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, we already do do this check before entering this block, so this is safe. The flow is:

  1. Client reader receives a requests for keys.
  2. Client reader checks if keys are cached, if so returns them.
  3. If not, client reader queries master to fetch the keys.
  4. New Master linearizes requests for a users keys
  5. New Master checks if we now have the keys in the cache, if so returns.
  6. Fetches keys from the remote server, and caches them if we share a room with the remote user.

So this is step 5, which is safe because we're basically re-doing a check we have previously done

@erikjohnston erikjohnston requested a review from clokep September 7, 2023 09:37
Copy link
Member

@clokep clokep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough explanation. Looks like you need to merge in develop though!

@erikjohnston erikjohnston enabled auto-merge (squash) September 7, 2023 12:34
@erikjohnston erikjohnston merged commit 1cd410a into develop Sep 7, 2023
37 checks passed
@erikjohnston erikjohnston deleted the erikj/recheck_if_device_is_cached branch September 7, 2023 12:45
yingziwu added a commit to yingziwu/synapse that referenced this pull request Oct 3, 2023
No significant changes since 1.93.0rc1.

The following issues are fixed in 1.93.0 (and RCs).

- [GHSA-4f74-84v3-j9q5](GHSA-4f74-84v3-j9q5) / [CVE-2023-41335](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-41335) — Low Severity

  Temporary storage of plaintext passwords during password changes.

- [GHSA-7565-cq32-vx2x](GHSA-7565-cq32-vx2x) / [CVE-2023-42453](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-42453) — Low Severity

  Improper validation of receipts allows forged read receipts.

See the advisories for more details. If you have any questions, email security@matrix.org.

- Add automatic purge after all users have forgotten a room. ([\matrix-org#15488](matrix-org#15488))
- Restore room purge/shutdown after a Synapse restart. ([\matrix-org#15488](matrix-org#15488))
- Support resolving homeservers using `matrix-fed` DNS SRV records from [MSC4040](matrix-org/matrix-spec-proposals#4040). ([\matrix-org#16137](matrix-org#16137))
- Add the ability to use `G` (GiB) and `T` (TiB) suffixes in configuration options that refer to numbers of bytes. ([\matrix-org#16219](matrix-org#16219))
- Add span information to requests sent to appservices. Contributed by MTRNord. ([\matrix-org#16227](matrix-org#16227))
- Add the ability to enable/disable registrations when using CAS. Contributed by Aurélien Grimpard. ([\matrix-org#16262](matrix-org#16262))
- Allow the `/notifications` endpoint to be routed to workers. ([\matrix-org#16265](matrix-org#16265))
- Enable users to easily unsubscribe to notifications emails via the `List-Unsubscribe` header. ([\matrix-org#16274](matrix-org#16274))
- Report whether a user is `locked` in the [List Accounts admin API](https://matrix-org.github.io/synapse/latest/admin_api/user_admin_api.html#list-accounts), and exclude locked users by default. ([\matrix-org#16328](matrix-org#16328))

- Fix a long-standing bug where multi-device accounts could cause high load due to presence. ([\matrix-org#16066](matrix-org#16066), [\matrix-org#16170](matrix-org#16170), [\matrix-org#16171](matrix-org#16171), [\matrix-org#16172](matrix-org#16172), [\matrix-org#16174](matrix-org#16174))
- Fix a long-standing bug where appservices using [MSC2409](matrix-org/matrix-spec-proposals#2409) to receive `to_device` messages would only get messages for one user. ([\matrix-org#16251](matrix-org#16251))
- Fix bug when using workers where Synapse could end up re-requesting the same remote device repeatedly. ([\matrix-org#16252](matrix-org#16252))
- Fix long-standing bug where we kept re-requesting a remote server's key repeatedly, potentially causing delays in receiving events over federation. ([\matrix-org#16257](matrix-org#16257))
- Avoid temporary storage of sensitive information. ([\matrix-org#16272](matrix-org#16272))
- Fix bug introduced in Synapse 1.49.0 when using dehydrated devices ([MSC2697](matrix-org/matrix-spec-proposals#2697)) and refresh tokens. Contributed by Hanadi. ([\matrix-org#16288](matrix-org#16288))
- Fix a long-standing bug where invalid receipts would be accepted. ([\matrix-org#16327](matrix-org#16327))
- Use standard name for UTF-8 charset in emails. ([\matrix-org#16329](matrix-org#16329))
- Don't try refetching device lists for users on remote hosts that are marked as "down". ([\matrix-org#16298](matrix-org#16298))

- Fix typos in the documentation. ([\matrix-org#16282](matrix-org#16282))
- Link to the Alpine Linux community package for Synapse. ([\matrix-org#16304](matrix-org#16304))
- Use string for `federation_client_minimum_tls_version` documentation examples. Contributed by @jcgruenhage. ([\matrix-org#16353](matrix-org#16353))

- Allow modules to delete rooms. ([\matrix-org#15997](matrix-org#15997))
- Add GCC and GNU Make to the Nix flake development environment so that `ruff` can be compiled. ([\matrix-org#16090](matrix-org#16090), [\matrix-org#16263](matrix-org#16263))
- Fix type checking when using the new version of Twisted. ([\matrix-org#16235](matrix-org#16235))
- Delete device messages asynchronously and in staged batches using the task scheduler. ([\matrix-org#16240](matrix-org#16240), [\matrix-org#16311](matrix-org#16311), [\matrix-org#16312](matrix-org#16312), [\matrix-org#16313](matrix-org#16313))
- Bump minimum supported Rust version to 1.61.0. ([\matrix-org#16248](matrix-org#16248))
- Update rust to version 1.71.1 in the nix development environment. ([\matrix-org#16260](matrix-org#16260))
- Simplify server key storage. ([\matrix-org#16261](matrix-org#16261))
- Reduce CPU overhead of change password endpoint. ([\matrix-org#16264](matrix-org#16264))
- Stop purging from tables slated for removal. ([\matrix-org#16273](matrix-org#16273))
- Improve type hints. ([\matrix-org#16276](matrix-org#16276), [\matrix-org#16301](matrix-org#16301), [\matrix-org#16325](matrix-org#16325), [\matrix-org#16326](matrix-org#16326))
- Raise `setuptools_rust` version cap to 1.7.0. ([\matrix-org#16277](matrix-org#16277))
- Fix using the new task scheduler causing lots of CPU to be used. ([\matrix-org#16278](matrix-org#16278))
- Upgrade CI run of Python 3.12 from rc1 to rc2. ([\matrix-org#16280](matrix-org#16280))
- Include values in SQL debug when using `execute_values` with Postgres. ([\matrix-org#16281](matrix-org#16281))
- Enable additional linting checks. ([\matrix-org#16283](matrix-org#16283))
- Refactor `receipts_graph` Postgres transactions to stop error messages. ([\matrix-org#16299](matrix-org#16299))
- Small improvements to logging in replication code. ([\matrix-org#16309](matrix-org#16309))
- Remove a reference cycle in background processes. ([\matrix-org#16314](matrix-org#16314))
- Only use literal strings for background process names. ([\matrix-org#16315](matrix-org#16315))
- Refactor `get_user_by_id`. ([\matrix-org#16316](matrix-org#16316))
- Speed up task to delete to-device messages. ([\matrix-org#16318](matrix-org#16318))
- Avoid patching code in tests. ([\matrix-org#16349](matrix-org#16349))
- Test against PostgreSQL 16. ([\matrix-org#16351](matrix-org#16351))

* Bump mypy from 1.4.1 to 1.5.1. ([\matrix-org#16300](matrix-org#16300))
* Bump black from 23.7.0 to 23.9.1. ([\matrix-org#16295](matrix-org#16295))
* Bump docker/build-push-action from 4 to 5. ([\matrix-org#16336](matrix-org#16336))
* Bump docker/login-action from 2 to 3. ([\matrix-org#16339](matrix-org#16339))
* Bump docker/metadata-action from 4 to 5. ([\matrix-org#16337](matrix-org#16337))
* Bump docker/setup-qemu-action from 2 to 3. ([\matrix-org#16338](matrix-org#16338))
* Bump furo from 2023.8.19 to 2023.9.10. ([\matrix-org#16340](matrix-org#16340))
* Bump gitpython from 3.1.32 to 3.1.35. ([\matrix-org#16267](matrix-org#16267), [\matrix-org#16279](matrix-org#16279))
* Bump mypy-zope from 1.0.0 to 1.0.1. ([\matrix-org#16291](matrix-org#16291))
* Bump pillow from 10.0.0 to 10.0.1. ([\matrix-org#16344](matrix-org#16344))
* Bump regex from 1.9.4 to 1.9.5. ([\matrix-org#16233](matrix-org#16233))
* Bump ruff from 0.0.286 to 0.0.290. ([\matrix-org#16342](matrix-org#16342))
* Bump serde_json from 1.0.105 to 1.0.107. ([\matrix-org#16296](matrix-org#16296), [\matrix-org#16345](matrix-org#16345))
* Bump twisted from 22.10.0 to 23.8.0. ([\matrix-org#16235](matrix-org#16235))
* Bump types-pillow from 10.0.0.2 to 10.0.0.3. ([\matrix-org#16293](matrix-org#16293))
* Bump types-setuptools from 68.0.0.3 to 68.2.0.0. ([\matrix-org#16292](matrix-org#16292))
* Bump typing-extensions from 4.7.1 to 4.8.0. ([\matrix-org#16341](matrix-org#16341))
Fizzadar added a commit to beeper/synapse-legacy-fork that referenced this pull request Oct 27, 2023
No significant changes since 1.93.0rc1.

The following issues are fixed in 1.93.0 (and RCs).

- [GHSA-4f74-84v3-j9q5](GHSA-4f74-84v3-j9q5) / [CVE-2023-41335](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-41335) — Low Severity

  Temporary storage of plaintext passwords during password changes.

- [GHSA-7565-cq32-vx2x](GHSA-7565-cq32-vx2x) / [CVE-2023-42453](https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2023-42453) — Low Severity

  Improper validation of receipts allows forged read receipts.

See the advisories for more details. If you have any questions, email security@matrix.org.

- Add automatic purge after all users have forgotten a room. ([\matrix-org#15488](matrix-org#15488))
- Restore room purge/shutdown after a Synapse restart. ([\matrix-org#15488](matrix-org#15488))
- Support resolving homeservers using `matrix-fed` DNS SRV records from [MSC4040](matrix-org/matrix-spec-proposals#4040). ([\matrix-org#16137](matrix-org#16137))
- Add the ability to use `G` (GiB) and `T` (TiB) suffixes in configuration options that refer to numbers of bytes. ([\matrix-org#16219](matrix-org#16219))
- Add span information to requests sent to appservices. Contributed by MTRNord. ([\matrix-org#16227](matrix-org#16227))
- Add the ability to enable/disable registrations when using CAS. Contributed by Aurélien Grimpard. ([\matrix-org#16262](matrix-org#16262))
- Allow the `/notifications` endpoint to be routed to workers. ([\matrix-org#16265](matrix-org#16265))
- Enable users to easily unsubscribe to notifications emails via the `List-Unsubscribe` header. ([\matrix-org#16274](matrix-org#16274))
- Report whether a user is `locked` in the [List Accounts admin API](https://matrix-org.github.io/synapse/latest/admin_api/user_admin_api.html#list-accounts), and exclude locked users by default. ([\matrix-org#16328](matrix-org#16328))

- Fix a long-standing bug where multi-device accounts could cause high load due to presence. ([\matrix-org#16066](matrix-org#16066), [\matrix-org#16170](matrix-org#16170), [\matrix-org#16171](matrix-org#16171), [\matrix-org#16172](matrix-org#16172), [\matrix-org#16174](matrix-org#16174))
- Fix a long-standing bug where appservices using [MSC2409](matrix-org/matrix-spec-proposals#2409) to receive `to_device` messages would only get messages for one user. ([\matrix-org#16251](matrix-org#16251))
- Fix bug when using workers where Synapse could end up re-requesting the same remote device repeatedly. ([\matrix-org#16252](matrix-org#16252))
- Fix long-standing bug where we kept re-requesting a remote server's key repeatedly, potentially causing delays in receiving events over federation. ([\matrix-org#16257](matrix-org#16257))
- Avoid temporary storage of sensitive information. ([\matrix-org#16272](matrix-org#16272))
- Fix bug introduced in Synapse 1.49.0 when using dehydrated devices ([MSC2697](matrix-org/matrix-spec-proposals#2697)) and refresh tokens. Contributed by Hanadi. ([\matrix-org#16288](matrix-org#16288))
- Fix a long-standing bug where invalid receipts would be accepted. ([\matrix-org#16327](matrix-org#16327))
- Use standard name for UTF-8 charset in emails. ([\matrix-org#16329](matrix-org#16329))
- Don't try refetching device lists for users on remote hosts that are marked as "down". ([\matrix-org#16298](matrix-org#16298))

- Fix typos in the documentation. ([\matrix-org#16282](matrix-org#16282))
- Link to the Alpine Linux community package for Synapse. ([\matrix-org#16304](matrix-org#16304))
- Use string for `federation_client_minimum_tls_version` documentation examples. Contributed by @jcgruenhage. ([\matrix-org#16353](matrix-org#16353))

- Allow modules to delete rooms. ([\matrix-org#15997](matrix-org#15997))
- Add GCC and GNU Make to the Nix flake development environment so that `ruff` can be compiled. ([\matrix-org#16090](matrix-org#16090), [\matrix-org#16263](matrix-org#16263))
- Fix type checking when using the new version of Twisted. ([\matrix-org#16235](matrix-org#16235))
- Delete device messages asynchronously and in staged batches using the task scheduler. ([\matrix-org#16240](matrix-org#16240), [\matrix-org#16311](matrix-org#16311), [\matrix-org#16312](matrix-org#16312), [\matrix-org#16313](matrix-org#16313))
- Bump minimum supported Rust version to 1.61.0. ([\matrix-org#16248](matrix-org#16248))
- Update rust to version 1.71.1 in the nix development environment. ([\matrix-org#16260](matrix-org#16260))
- Simplify server key storage. ([\matrix-org#16261](matrix-org#16261))
- Reduce CPU overhead of change password endpoint. ([\matrix-org#16264](matrix-org#16264))
- Stop purging from tables slated for removal. ([\matrix-org#16273](matrix-org#16273))
- Improve type hints. ([\matrix-org#16276](matrix-org#16276), [\matrix-org#16301](matrix-org#16301), [\matrix-org#16325](matrix-org#16325), [\matrix-org#16326](matrix-org#16326))
- Raise `setuptools_rust` version cap to 1.7.0. ([\matrix-org#16277](matrix-org#16277))
- Fix using the new task scheduler causing lots of CPU to be used. ([\matrix-org#16278](matrix-org#16278))
- Upgrade CI run of Python 3.12 from rc1 to rc2. ([\matrix-org#16280](matrix-org#16280))
- Include values in SQL debug when using `execute_values` with Postgres. ([\matrix-org#16281](matrix-org#16281))
- Enable additional linting checks. ([\matrix-org#16283](matrix-org#16283))
- Refactor `receipts_graph` Postgres transactions to stop error messages. ([\matrix-org#16299](matrix-org#16299))
- Small improvements to logging in replication code. ([\matrix-org#16309](matrix-org#16309))
- Remove a reference cycle in background processes. ([\matrix-org#16314](matrix-org#16314))
- Only use literal strings for background process names. ([\matrix-org#16315](matrix-org#16315))
- Refactor `get_user_by_id`. ([\matrix-org#16316](matrix-org#16316))
- Speed up task to delete to-device messages. ([\matrix-org#16318](matrix-org#16318))
- Avoid patching code in tests. ([\matrix-org#16349](matrix-org#16349))
- Test against PostgreSQL 16. ([\matrix-org#16351](matrix-org#16351))

* Bump mypy from 1.4.1 to 1.5.1. ([\matrix-org#16300](matrix-org#16300))
* Bump black from 23.7.0 to 23.9.1. ([\matrix-org#16295](matrix-org#16295))
* Bump docker/build-push-action from 4 to 5. ([\matrix-org#16336](matrix-org#16336))
* Bump docker/login-action from 2 to 3. ([\matrix-org#16339](matrix-org#16339))
* Bump docker/metadata-action from 4 to 5. ([\matrix-org#16337](matrix-org#16337))
* Bump docker/setup-qemu-action from 2 to 3. ([\matrix-org#16338](matrix-org#16338))
* Bump furo from 2023.8.19 to 2023.9.10. ([\matrix-org#16340](matrix-org#16340))
* Bump gitpython from 3.1.32 to 3.1.35. ([\matrix-org#16267](matrix-org#16267), [\matrix-org#16279](matrix-org#16279))
* Bump mypy-zope from 1.0.0 to 1.0.1. ([\matrix-org#16291](matrix-org#16291))
* Bump pillow from 10.0.0 to 10.0.1. ([\matrix-org#16344](matrix-org#16344))
* Bump regex from 1.9.4 to 1.9.5. ([\matrix-org#16233](matrix-org#16233))
* Bump ruff from 0.0.286 to 0.0.290. ([\matrix-org#16342](matrix-org#16342))
* Bump serde_json from 1.0.105 to 1.0.107. ([\matrix-org#16296](matrix-org#16296), [\matrix-org#16345](matrix-org#16345))
* Bump twisted from 22.10.0 to 23.8.0. ([\matrix-org#16235](matrix-org#16235))
* Bump types-pillow from 10.0.0.2 to 10.0.0.3. ([\matrix-org#16293](matrix-org#16293))
* Bump types-setuptools from 68.0.0.3 to 68.2.0.0. ([\matrix-org#16292](matrix-org#16292))
* Bump typing-extensions from 4.7.1 to 4.8.0. ([\matrix-org#16341](matrix-org#16341))

# -----BEGIN PGP SIGNATURE-----
#
# iQFEBAABCgAuFiEEBTGR3/RnAzBGUif3pULk7RsPrAkFAmUS8iEQHGVyaWtAbWF0
# cml4Lm9yZwAKCRClQuTtGw+sCXFgB/912+T+BydS290UECCXp9kpRB5xo3aWe8mX
# NCx9Oor1TRLBpLhlQWk786gP1Q9JAQpmA4z6kovjKaLG1b4oLbZNjbPG4hEYc8ow
# /rVzGor52pfyS7uS5GW+rRmapcw4AYND6hA9XGELupf2joC8LXioSCEVG4cxwD8E
# IgIbLc87C7KpaUkNbDEz3jzZ3/BVRGcIYyhF3zTK2ZApvH2qsegq8wKYx4EYJnfh
# 87DXtTCNwA+bW6XZYPtUwPKjZ+TGB11IizxmQySGLbAxvH+GUan8X8TizGyxaqaA
# FDk3yMBbUo0R7ljDgL5YsZXT6qsZz+IBz/bsMzSbZ39f/yEUqHak
# =1/pL
# -----END PGP SIGNATURE-----
# gpg: Signature made Tue Sep 26 16:00:49 2023 BST
# gpg:                using RSA key 053191DFF4670330465227F7A542E4ED1B0FAC09
# gpg:                issuer "erik@matrix.org"
# gpg: Can't check signature: No public key

# Conflicts:
#	.github/workflows/docker.yml
#	.github/workflows/push_complement_image.yml
#	.github/workflows/release-artifacts.yml
#	.github/workflows/tests.yml
#	poetry.lock
#	synapse/appservice/scheduler.py
#	synapse/handlers/pagination.py
#	synapse/handlers/room.py
#	synapse/rest/client/account_data.py
#	tests/rest/client/test_receipts.py
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants